Efficient Discovery of Concise Association Rules from Large Databases

Author

  • Vikram Pudi
Abstract

Association rules are interesting correlations among attributes in a database. These rules have many applications in areas ranging from e-commerce to sports to census analysis to medical diagnosis. The discovery of association rules is an extremely computationally expensive task and it is therefore imperative to have fast scalable algorithms for mining these rules. In this thesis, we present efficient techniques for discovering association rules from large databases and for removing redundancy from these rules so as to improve the quality of output. We also handle growing databases. Specifically, we present three new algorithms: (1) ARMOR: This algorithm discovers association rules from databases and requires at most two database scans. We empirically show its performance to be within a factor of two of an unachievable lower bound. (2) g-ARMOR: This is an extension to ARMOR that is designed to remove redundancy from association rules during the mining process. This is especially important because the number of association rules generated in typical mining operations runs into the tens of thousands. g-ARMOR results in an orders-of-magnitude reduction in the number of rules, thereby making the mining output comprehensible to end users. (3) DELTA: This algorithm incrementally mines evolving databases. It utilizes previous mining results to efficiently mine the current database after it has been updated with fresh data. It also handles situations where the mining specifications over the current database differ from those used over the original database, a common occurrence in practice.
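To make the terminology concrete, the following is a minimal sketch of what mining association rules means, using the standard support and confidence measures. It is a brute-force enumeration for illustration only and is not ARMOR, g-ARMOR, or DELTA, whose internals the abstract does not describe. The function names, the thresholds, and the toy transactions are all assumptions made up for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def brute_force_rules(transactions, min_support, min_confidence):
    """Enumerate rules X -> Y by exhaustive search.

    Illustration only: the cost grows exponentially with the number of
    distinct items, which is why scalable mining algorithms are needed.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s == 0 or s < min_support:
                continue  # infrequent itemset: no rule can come from it
            # Split the frequent itemset into antecedent and consequent.
            for k in range(1, size):
                for antecedent in combinations(candidate, k):
                    conf = s / support(antecedent, transactions)
                    if conf >= min_confidence:
                        consequent = tuple(
                            i for i in candidate if i not in antecedent
                        )
                        rules.append((antecedent, consequent, s, conf))
    return rules


# Hypothetical toy database of four market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

for antecedent, consequent, s, conf in brute_force_rules(transactions, 0.5, 0.9):
    print(antecedent, "->", consequent, f"support={s:.2f} confidence={conf:.2f}")
```

Even on this tiny input the search visits every itemset of size two or more, which hints at why the abstract stresses limiting the number of database scans and pruning redundant rules.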

Similar resources

A Distributed Algorithm for Mining Fuzzy Association Rules

Data mining, also known as knowledge discovery in databases, is the process of discovering potentially useful, hidden knowledge or relations among data from large databases. An important topic in data mining research is the discovery of association rules. The majority of databases are distributed nowadays. This paper presents an algorithm for mining fuzzy association rules f...

Mining Multiple-Level Association Rules in Large Databases

A top-down progressive deepening method is developed for efficient mining of multiple-level association rules from large transaction databases based on the Apriori principle. A group of variant algorithms is proposed based on the ways of sharing intermediate results, with the relative performance tested and analyzed. The enforcement of different interestingness measurements to find more intere...

Discovery of Multiple-Level Association Rules from Large Databases

Discovery of association rules from large databases has been a focused topic recently in the research into database mining. Previous studies discover association rules at a single concept level; however, mining association rules at multiple concept levels may lead to finding more informative and refined knowledge from data. In this paper, we study efficient methods for mining multiple-level associat...

A Survey of Data Mining Applications in the Health System

Introduction: Extensive amounts of data stored in medical databases require the development of specialized tools for accessing the data, data analysis, knowledge discovery, and the effective use of the data. Data mining is one of the most important of these methods. The article outlines the data mining techniques used and illustrates their applicability to medical diagnostic and prognostic problems. ...

Concise Representations for Association Rules in Multi-level Datasets

Association rule mining plays an important role in knowledge and information discovery. Often for a dataset, a huge number of rules can be extracted, but many of them are redundant, especially in the case of multi-level datasets. Mining non-redundant rules is a promising approach to solve this problem. However, existing work (Pasquier et al. 2005, Xu & Li 2007) is only focused on single level d...

Publication date: 2003